Abstract: This paper addresses the problem of segmenting 
a time-series with respect to changes in the mean value or in 
the variance. The first case is when the time data is modeled 
as a sequence of independent and normally distributed random 
variables with unknown, possibly changing, mean value but 
fixed variance. The main assumption is that the mean value 
is piecewise constant in time, and the task is to estimate the 
change times and the mean values within the segments. The 
second case is when the mean value is constant, but the variance 
can change. The assumption is that the variance is piecewise 
constant in time, and we want to estimate change times and the 
variance values within the segments. To find solutions to these 
problems, we will study an $\ell_1$ regularized maximum likelihood 
method, related to the fused lasso method and $\ell_1$ trend filtering, 
where the parameters to be estimated are free to vary at each 
sample. To penalize variations in the estimated parameters, the 
$\ell_1$-norm of the time difference of the parameters is used as a 
regularization term. This idea is closely related to total variation 
denoising. The main contribution is that a convex formulation 
of this variance estimation problem, where the parametrization 
is based on the inverse of the variance, can be formulated 
as a certain $\ell_1$ mean estimation problem. This implies that 
results and methods for mean estimation can be applied to the 
challenging problem of variance segmentation/estimation. 
I. Introduction 

The problem of estimating the mean, trends and variances 
in time series data is of fundamental importance in signal 
processing and in many other disciplines such as processing 
of financial and biological data. This is typically done to 
preprocess data before estimating for example parametric 
models. For non-stationary data it is also important to be able 
to detect changes in mean and variances and segment the data 
into stationary subsets. A classical way is to use windowing 
to handle time-variations, by for example subtracting the 
windowed sample mean estimate from the data or scaling 
the data with a windowed estimate of the variance. More 
advanced detection and segmentation methods are often 
based on probabilistic models, such as Markov models, and 
involve a lot of tuning and user choices [8]. An alternative 
way to approach this problem is to use regularization to 
penalize variations and changes in the estimated parameter 
vector. Recently, there has been a lot of effort on applying 
$\ell_1$-norm regularization in estimation in order to obtain convex 
optimization problems [9], [5]. Our work is inspired by 
the $\ell_1$ trend filtering method in [10] and the fused lasso 
method [13]. The $\ell_1$ trend filtering method considers changes 
in the mean value of the data. Here we are also interested 
in changes in the variance. This problem is closely related 
to the covariance selection problem introduced in [4]. The 
paper [1] formulates this as a convex optimization problem 
by using the inverse of the covariance matrix as parameter, 
see also [10]. This idea is also used in the graphical lasso 
method [6]. 

The paper is organized as follows. In Section II the general 
problem formulation is specified. Section III considers the 
special case of mean estimation, while Section IV deals with 
variance estimation and its relation to mean estimation. 
Section V contains a numerical example of variance estimation, 
and Section VI discusses the extension to the multi-variate 
case. The paper is concluded in Section VII. 



II. Problem Statement 

Consider the independent scalar sequence $\{y_t\}$ which 
satisfies 

$$y_t \sim \mathcal{N}(m_t, \sigma_t^2),$$

where both the mean $\{m_t\}$ and the variance $\{\sigma_t^2\}$ are 
(unknown) piecewise constant sequences. Assume that the 
measurements $\{y_1, \dots, y_N\}$ are available, and we are 
interested in estimating $m_1, \dots, m_N$ and $\sigma_1^2, \dots, \sigma_N^2$. 

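To solve this problem, first notice that the model 
$y_t \sim \mathcal{N}(m_t, \sigma_t^2)$ is a standard exponential family with canonical 
parameters, [3, Example 1.2], 

$$\mu_t := \frac{m_t}{\sigma_t^2} \in \mathbb{R}, \qquad \eta_t := -\frac{1}{2\sigma_t^2} \in \mathbb{R}_-.$$

This means that the log-likelihood of $\{\mu_1, \dots, \mu_N, \eta_1, \dots, \eta_N\}$ 
given $\{y_1, \dots, y_N\}$ is 

$$l(\mu_1, \dots, \mu_N, \eta_1, \dots, \eta_N) = -\frac{N}{2}\ln\pi + \sum_{t=1}^{N}\left[\frac{1}{2}\ln(-\eta_t) + \frac{\mu_t^2}{4\eta_t} + \eta_t y_t^2 + \mu_t y_t\right].$$

Moreover, by [3, Theorem 1.13] it follows that $l$ is strictly 
concave on 

$$\{(\mu_1, \dots, \mu_N, \eta_1, \dots, \eta_N) : \mu_t \in \mathbb{R},\ \eta_t \in \mathbb{R}_-,\ t = 1, \dots, N\}.$$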
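As a numerical sanity check of the canonical parametrization, the following Python sketch compares the Gaussian log-density written in terms of $(m_t, \sigma_t^2)$ with the same density written in terms of $\mu_t = m_t/\sigma_t^2$ and $\eta_t = -1/(2\sigma_t^2)$, that is $-\tfrac{1}{2}\ln\pi + \tfrac{1}{2}\ln(-\eta_t) + \mu_t^2/(4\eta_t) + \eta_t y_t^2 + \mu_t y_t$. The function names and the test points are illustrative choices and are not taken from the paper.

```python
import math


def gaussian_logpdf(y, m, var):
    """Log-density of N(m, var) evaluated at y (mean/variance form)."""
    return -0.5 * math.log(2.0 * math.pi * var) - (y - m) ** 2 / (2.0 * var)


def to_canonical(m, var):
    """Map (mean, variance) to the canonical parameters (mu, eta)."""
    return m / var, -1.0 / (2.0 * var)


def canonical_logpdf(y, mu, eta):
    """The same log-density in canonical form; requires eta < 0."""
    return (-0.5 * math.log(math.pi) + 0.5 * math.log(-eta)
            + mu ** 2 / (4.0 * eta) + eta * y ** 2 + mu * y)


# Arbitrary (y, mean, variance) triples used only for the comparison.
samples = [(0.3, 1.0, 2.0), (-1.2, 0.5, 0.7), (4.0, -2.0, 3.5)]
max_gap = max(
    abs(gaussian_logpdf(y, m, v) - canonical_logpdf(y, *to_canonical(m, v)))
    for y, m, v in samples
)
```

If the two forms agree, `max_gap` is at the level of floating-point rounding.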

Assumption: The prior knowledge is that the sequences 
$\{m_t\}$ and $\{\sigma_t^2\}$ are piecewise constant in time. This means 
that the difference sequences $\{m_{t+1} - m_t\}$ and $\{\sigma_{t+1}^2 - \sigma_t^2\}$ 
are sparse. 

Inspired by [10] we propose an estimator based on the 
solution of the following optimization problem: 

$$\underset{\mu_1, \dots, \mu_N,\ \eta_1 < 0, \dots, \eta_N < 0}{\text{minimize}} \ \sum_{t=1}^{N}\left[-\frac{\ln(-\eta_t)}{2} - \frac{\mu_t^2}{4\eta_t} - \eta_t y_t^2 - \mu_t y_t\right] + \lambda_1\sum_{t=2}^{N}|\mu_t - \mu_{t-1}| + \lambda_2\sum_{t=2}^{N}|\eta_t - \eta_{t-1}|. \tag{1}$$

This is a convex optimization problem where the cost function 
is separable, plus two terms that are separable in the 
difference between consecutive variables. The $\ell_1$-norm is used 
to penalize non-sparse solutions, while still having a convex 
objective function. Standard software for convex optimization 
can be used to solve (1). However, it is possible to use 
the special structure to derive more efficient optimization 
algorithms for (1), see [2], [14]. 

III. Mean Estimation 

Consider the problem of mean segmentation under the 
assumption that the variance is constant. The optimization 
problem (1) then simplifies to 

$$\text{minimize} \ V(m_1, \dots, m_N), \tag{2}$$

where 

$$V(m_1, \dots, m_N) = \frac{1}{2}\sum_{t=1}^{N}(y_t - m_t)^2 + \lambda_1\sum_{t=2}^{N}|m_t - m_{t-1}|. \tag{3}$$



The $t$:th element of the sub-differential of the cost function 
equals 

$$[\partial V(m_1, \dots, m_N)]_t = -(y_t - m_t) + \lambda_1[\operatorname{sign}(m_t - m_{t-1}) - \operatorname{sign}(m_{t+1} - m_t)], \quad 2 \le t \le N-1, \tag{4}$$

where 

$$\operatorname{sign}(x) \begin{cases} = -1, & x < 0, \\ \in [-1, 1], & x = 0, \\ = 1, & x > 0. \end{cases} \tag{5}$$

For $t = 1$ and $t = N$ obvious modifications have to be made 
to take the initial and end conditions into account. Using the 
incremental structure of the sub-differential, it makes sense 
to add up the expressions (4) to obtain 

$$\sum_{t=1}^{k}[\partial V(m_1, \dots, m_N)]_t = -\sum_{t=1}^{k}(y_t - m_t) - \lambda_1\operatorname{sign}(m_{k+1} - m_k), \quad 1 \le k < N,$$

$$\sum_{t=1}^{N}[\partial V(m_1, \dots, m_N)]_t = -\sum_{t=1}^{N}(y_t - m_t).$$

This is more or less the sub-differential with respect to 
the variables $r_1 = m_1$, $r_k = m_k - m_{k-1}$, $k = 2, \dots, N$. For 
optimality the sub-differential should include zero, which leads 
to the optimality conditions 

$$\sum_{t=1}^{k}(y_t - m_t) = -\lambda_1\operatorname{sign}(m_{k+1} - m_k), \quad 1 \le k < N, \tag{6}$$

$$\sum_{t=1}^{N}(y_t - m_t) = 0. \tag{7}$$

The "empirical mean" 

$$\bar{m} = \frac{1}{N}\sum_{t=1}^{N} y_t \tag{8}$$
obtained from (7), also satisfies the first $N - 1$ optimality 
conditions (6) if 

$$\left|\sum_{t=1}^{k} y_t - \frac{k}{N}\sum_{t=1}^{N} y_t\right| \le \lambda_1, \quad 1 \le k < N.$$



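Here we have used that $\operatorname{sign}(0) \in [-1, 1]$. This is the case 
if $\lambda_1$ is large enough. Since the optimization problem (2) 
is convex, the sub-differential condition is necessary and 
sufficient. Hence we have now derived the $\lambda_{\max}$ result, [10]: let 

$$[\lambda_1]_{\max} := \max_{k=1,\dots,N-1}\left|\sum_{t=1}^{k} y_t - \frac{k}{N}\sum_{t=1}^{N} y_t\right|. \tag{9}$$

Then, $m_1 = \dots = m_N = \bar{m}$ is the optimal solution of (2) if 
and only if $\lambda_1 \ge [\lambda_1]_{\max}$. The expression for $[\lambda_1]_{\max}$ is more 
obvious by dividing the $k$:th term in (9) by $k$, 

$$\frac{1}{k}\left|\sum_{t=1}^{k} y_t - \frac{k}{N}\sum_{t=1}^{N} y_t\right| = \left|\frac{1}{k}\sum_{t=1}^{k} y_t - \frac{1}{N}\sum_{t=1}^{N} y_t\right|.$$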

Hence, we compare the empirical means for the sequences 
of length $k = 1, \dots, N-1$ with $\bar{m}$, and then relate $\lambda$ to the 
maximum deviation. 

The $\lambda_{\max}$ result is very useful in order to find a good 
choice of $\lambda$, and also to derive efficient numerical solvers. 
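The $\lambda_{\max}$ bound is cheap to evaluate. The sketch below, in plain Python, computes it as the largest absolute partial sum of $y_t - \bar{m}$ and checks the claim on a toy signal with a basic solver. The solver is an assumption of this illustration: projected gradient on the dual of the mean filtering problem with cost $\tfrac{1}{2}\sum_t (y_t - m_t)^2 + \lambda\sum_t |m_t - m_{t-1}|$. It is not the algorithm of [10], [2] or [14], and the names `lambda_max` and `l1_mean_filter` are hypothetical.

```python
def lambda_max(y):
    """Largest |sum_{t<=k} (y_t - mean(y))| over k = 1, ..., N-1."""
    n = len(y)
    mean = sum(y) / n
    partial, worst = 0.0, 0.0
    for k in range(n - 1):
        partial += y[k] - mean
        worst = max(worst, abs(partial))
    return worst


def l1_mean_filter(y, lam, iters=5000, step=0.25):
    """Approximately minimize 0.5*sum (y_t - m_t)^2 + lam*sum |m_t - m_{t-1}|.

    Projected gradient on the dual: one multiplier per difference, clipped
    to [-lam, lam]; the estimate is m[t] = y[t] + z[t+1] - z[t].
    """
    n = len(y)
    z = [0.0] * (n + 1)  # z[k] pairs with m[k] - m[k-1]; z[0] = z[n] = 0
    for _ in range(iters):
        m = [y[t] + z[t + 1] - z[t] for t in range(n)]
        for k in range(1, n):
            z[k] = min(lam, max(-lam, z[k] + step * (m[k] - m[k - 1])))
    return [y[t] + z[t + 1] - z[t] for t in range(n)]


# Toy signal with one change in the mean (values chosen for illustration).
y = [1.0, 1.2, 0.8, 1.1, 3.0, 3.2, 2.9, 3.1]
lam_max = lambda_max(y)

flat = l1_mean_filter(y, 1.01 * lam_max)    # above the bound: constant fit
stepped = l1_mean_filter(y, 0.5 * lam_max)  # below the bound: change remains
```

With $\lambda$ slightly above the bound the estimate collapses to the empirical mean, while for $\lambda$ below it at least one change survives. The fixed step of 0.25 is a conservative choice for this first-difference structure, and the iteration count is only tuned for short signals.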

IV. Variance Estimation 

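We will now study the variance estimation problem under 
the assumption that the mean values are known. We can, 
without losing generality, assume that $m_t = 0$. For this special 
case the optimization problem (1) equals 

$$\text{minimize} \ W(\eta_1, \dots, \eta_N), \tag{10}$$
$$\text{subject to} \ \eta_t < 0, \quad t = 1, \dots, N,$$

where 

$$W(\eta_1, \dots, \eta_N) = \sum_{t=1}^{N}\left[-\frac{\ln(-\eta_t)}{2} - \eta_t y_t^2\right] + \lambda_2\sum_{t=2}^{N}|\eta_t - \eta_{t-1}|. \tag{11}$$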
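We will now show that (10) is equivalent, in the sense of 
having related optimal solutions, to the optimization problem 

$$\text{minimize} \ V(\sigma_1^2, \dots, \sigma_N^2), \tag{12}$$
$$\text{subject to} \ \sigma_t^2 > 0, \quad t = 1, \dots, N,$$

where 

$$V(\sigma_1^2, \dots, \sigma_N^2) = \frac{1}{2}\sum_{t=1}^{N}(y_t^2 - \sigma_t^2)^2 + \lambda_2\sum_{t=2}^{N}|\sigma_t^2 - \sigma_{t-1}^2| \tag{13}$$

and $\sigma_t^2 = -1/(2\eta_t)$ is the variance. Now 

$$\eta_t - \eta_{t-1} = \frac{1}{2\sigma_{t-1}^2} - \frac{1}{2\sigma_t^2} = \frac{\sigma_t^2 - \sigma_{t-1}^2}{2\sigma_t^2\sigma_{t-1}^2}, \tag{14}$$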



which means that the sign of the differences is not affected 
by the transformation $\sigma_t^2 = -1/(2\eta_t)$. This will be critical 
in deriving the equivalence result. The formulation (12) also 
makes sense from a practical point of view, since the variance 
of $y_t$ (in the zero mean case) is the mean of $y_t^2$. Notice, 
however, that this is not directly obvious from the 
log-likelihood, but is often used in signal processing under the 
name of covariance fitting, [12]. 

We now have the following main result: 



Theorem 1. The convex optimization problems (10) and 
(12), with $\sigma_t^2 = -1/(2\eta_t)$, have the same sub-gradient 
optimality conditions. 

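Proof: First notice that 

$$\frac{\partial}{\partial\eta_t}\sum_{s=1}^{N}\left[-\frac{1}{2}\ln(-\eta_s) - \eta_s y_s^2\right] = -\frac{1}{2\eta_t} - y_t^2 = \sigma_t^2 - y_t^2 = \frac{\partial}{\partial\sigma_t^2}\,\frac{1}{2}\sum_{s=1}^{N}(y_s^2 - \sigma_s^2)^2.$$

Next, for $2 \le t < N$, 

$$\frac{\partial}{\partial\eta_t}\sum_{s=2}^{N}|\eta_s - \eta_{s-1}| = \operatorname{sign}(\eta_t - \eta_{t-1}) - \operatorname{sign}(\eta_{t+1} - \eta_t) = \operatorname{sign}(\sigma_t^2 - \sigma_{t-1}^2) - \operatorname{sign}(\sigma_{t+1}^2 - \sigma_t^2) = \frac{\partial}{\partial\sigma_t^2}\sum_{s=2}^{N}|\sigma_s^2 - \sigma_{s-1}^2|.$$

Here we have used that the sign function defined by (5) only 
depends on the sign of its argument and (14) implies that the 
sign is not changed by the transformation $\sigma_t^2 = -1/(2\eta_t)$. 

Q.E.D. 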

Since both optimization problems (10) and (12) are convex, 
Theorem 1 implies that we can re-use algorithms and 
results from the mean estimation problem for the variance 
estimation problem (12). For example, it directly follows that 

$$[\lambda_2]_{\max} = \max_{k=1,\dots,N-1}\left|\sum_{t=1}^{k} y_t^2 - \frac{k}{N}\sum_{t=1}^{N} y_t^2\right|,$$

and for $\lambda_2 \ge [\lambda_2]_{\max}$ the constant "empirical variance" solutions 

$$\eta_t = -\frac{N}{2\sum_{s=1}^{N} y_s^2}, \qquad \sigma_t^2 = \frac{1}{N}\sum_{s=1}^{N} y_s^2, \qquad t = 1, \dots, N,$$

are the optimal solutions to (10) and (12), respectively. From 
a practical point of view one has to be a bit careful when 
squaring $y_t$ since outliers are amplified. 
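By Theorem 1, the mean filtering bound carries over by replacing $y_t$ with $y_t^2$. The sketch below (plain Python; the function names and toy numbers are illustrative assumptions) evaluates $[\lambda_2]_{\max}$ and the constant empirical variance with the corresponding $\eta = -1/(2\sigma^2)$, and checks on a small sequence that consecutive differences have the same sign in both parametrizations.

```python
def variance_lambda_max(y):
    """Return ([lambda_2]_max, empirical variance) for zero-mean data y.

    This is the mean-filtering bound applied to the squares y_t^2.
    """
    sq = [v * v for v in y]
    n = len(sq)
    emp_var = sum(sq) / n  # constant "empirical variance" solution
    partial, worst = 0.0, 0.0
    for k in range(n - 1):
        partial += sq[k] - emp_var
        worst = max(worst, abs(partial))
    return worst, emp_var


def eta_from_variance(var):
    """Canonical parameter eta = -1 / (2 * variance), always negative."""
    return -1.0 / (2.0 * var)


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


# Toy zero-mean measurements (illustrative values only).
y = [0.5, -1.0, 0.2, 2.0, -2.5, 1.5]
lam2_max, emp_var = variance_lambda_max(y)
eta_hat = eta_from_variance(emp_var)

# Differences keep their sign under the map variance -> eta.
variances = [2.0, 2.0, 1.0, 3.0, 3.0, 1.0]
etas = [eta_from_variance(v) for v in variances]
signs_match = all(
    sign(etas[t] - etas[t - 1]) == sign(variances[t] - variances[t - 1])
    for t in range(1, len(variances))
)
```

Because the squares enter directly, a single large $|y_t|$ can dominate the partial sums, which is the outlier sensitivity noted in the text.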



V. Example 

Consider a signal $\{y_t\}$ which satisfies $y_t \sim \mathcal{N}(0, \sigma_t^2)$, 
where $\{\sigma_t^2\}$ is a piecewise constant sequence: 

$$\sigma_t^2 = \begin{cases} 2, & \text{if } 0 < t \le 250, \\ 1, & \text{if } 250 < t \le 500, \\ 3, & \text{if } 500 < t \le 750, \\ 1, & \text{if } 750 < t \le 1000. \end{cases}$$

Given 1000 measurements $\{y_1, \dots, y_{1000}\}$, we want to 
estimate the variances $\sigma_1^2, \dots, \sigma_{1000}^2$. To solve problem (12) 
we used CVX, a package for specifying and solving 
convex programs [7]. Figure 1 shows the resulting estimates 
of $\{\sigma_t^2\}$, the true values of $\{\sigma_t^2\}$ and the measurements 
$\{y_1, \dots, y_{1000}\}$. 
Fig. 1. Estimated variance (black line), true variance (blue line) and 
measurements (red crosses). 
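A scaled-down version of this experiment can be sketched in plain Python. Everything specific here is an assumption made for illustration: the segment length of 20 samples (instead of 250), the random seed, the choice $\lambda_2 = 0.05\,[\lambda_2]_{\max}$, and the solver, which is a simple dual projected-gradient iteration for the cost $\tfrac{1}{2}\sum_t (y_t^2 - \sigma_t^2)^2 + \lambda_2\sum_t |\sigma_t^2 - \sigma_{t-1}^2|$ rather than CVX. No output values are quoted since they depend on the random draw.

```python
import math
import random


def tv_filter(x, lam, iters=5000, step=0.25):
    """Approximately minimize 0.5*sum (x_t - v_t)^2 + lam*sum |v_t - v_{t-1}|
    by projected gradient on the dual multipliers z (clipped to [-lam, lam])."""
    n = len(x)
    z = [0.0] * (n + 1)  # z[k] pairs with v[k] - v[k-1]; z[0] = z[n] = 0
    for _ in range(iters):
        v = [x[t] + z[t + 1] - z[t] for t in range(n)]
        for k in range(1, n):
            z[k] = min(lam, max(-lam, z[k] + step * (v[k] - v[k - 1])))
    return [x[t] + z[t + 1] - z[t] for t in range(n)]


random.seed(0)   # arbitrary seed, for repeatability only
segment = 20     # assumed segment length (250 in the paper's example)
true_var = [2.0] * segment + [1.0] * segment + [3.0] * segment + [1.0] * segment
y = [random.gauss(0.0, math.sqrt(v)) for v in true_var]
squares = [v * v for v in y]  # the data fitted by the variance problem

# [lambda_2]_max: largest absolute partial sum of y_t^2 minus its mean.
mean_sq = sum(squares) / len(squares)
partial, lam2_max = 0.0, 0.0
for s in squares[:-1]:
    partial += s - mean_sq
    lam2_max = max(lam2_max, abs(partial))

# Assumed tuning (not from the paper): a small fraction of the bound.
var_hat = tv_filter(squares, 0.05 * lam2_max)
```

The positivity constraint is not enforced separately in this sketch: since the data $y_t^2$ are nonnegative, the exact minimizer of this cost stays within the range of the data. With only 20 samples per segment the estimate is much noisier than in the 1000-sample experiment.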



VI. Two Extensions 

A. Simultaneous Mean and Variance Estimation 

The general mean and variance optimization problem (1) is 
convex and it is possible to find $\lambda_{\max}$ expressions. A difficulty 
is the term $\mu_t^2/(4\eta_t)$ in (1) that couples the mean and variance 
optimization. It is also non-trivial to tune this algorithm in 
the sense that it is difficult to separate a change in mean 
from a change in variance based on short data records. 

B. The Multi-Variate Case 

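Assume that the process $y_t \sim \mathcal{N}(m_t, \Sigma_t) \in \mathbb{R}^n$, that is, the 
mean $m_t \in \mathbb{R}^n$ and the covariance matrix $\Sigma_t \in \mathbb{R}^{n \times n}$. The 
canonical parameters are 

$$\mu_t := \Sigma_t^{-1} m_t, \qquad H_t := -\frac{1}{2}\Sigma_t^{-1}.$$

The corresponding $\ell_1$ regularized maximum log-likelihood 
estimation problem is 

$$\underset{\substack{\mu_1, \dots, \mu_N \\ H_1 \prec 0, \dots, H_N \prec 0}}{\text{minimize}} \ \sum_{t=1}^{N}\left\{-\frac{1}{2}\log\det(-H_t) - \frac{1}{4}\operatorname{Tr}\{H_t^{-1}\mu_t\mu_t^T\} - \operatorname{Tr}\{H_t y_t y_t^T\} - \mu_t^T y_t\right\} + \lambda_1\sum_{t=2}^{N}\|\mu_t - \mu_{t-1}\|_2 + \lambda_2\sum_{t=2}^{N}\|H_t - H_{t-1}\|_F,$$

where we have used the Euclidean vector norm and 
Frobenius matrix norm. This is a convex optimization problem 
with a large number of unknowns, $n + (n+1)n/2$ per 
$n$-dimensional sample $y_t$. 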

A problem when trying to generalize the results on the 
equivalence of variance estimation and mean estimation of 
$y_t y_t^T$ is that the ordering relation 

$$H_{t+1} - H_t = \frac{1}{2}\Sigma_t^{-1}[\Sigma_{t+1} - \Sigma_t]\Sigma_{t+1}^{-1}$$

does not hold componentwise. Still, the convex problem 

$$\underset{\Sigma_1 \succ 0, \dots, \Sigma_N \succ 0}{\text{minimize}} \ \sum_{t=1}^{N}\|y_t y_t^T - \Sigma_t\|_F + \lambda_2\sum_{t=2}^{N}\|\Sigma_t - \Sigma_{t-1}\|_F$$

makes sense as a covariance matrix fitting problem. 

VII. Conclusions 

The objective of this contribution has been to introduce 
the concept of $\ell_1$ variance filtering and relate this approach 
to the problem of $\ell_1$ mean filtering. The advantage of 
the $\ell_1$ approach is that there are only one or two design 
parameters ($\lambda_1$ and $\lambda_2$), while classical approaches involve 
more user design variables such as thresholds and transition 
probabilities. The framework presented can also be used 
for more advanced time-series model estimations such as 
autoregressive models, see [11]. The number of variables in 
the multi-variate problem can be huge. Tailored algorithms 
for this problem based on the alternating direction method 
of multipliers algorithm, [2], have recently been proposed 
in [14]. 